Lab Exercise 3: Time Series Analysis For Gold Price Prediction

Submitted by: Hima Mohandas, Shahina Hayat, Robert Jordan, Shawn Kovacs (Group 5)

Introduction

Gold has an allure and a price tag, though some would say it has no real intrinsic value. Something about its glint and sparkle has always appealed to humans. Gold is one of the few metals that humans have used as currency. Many people hold gold as a safe store of value, and some seek to profit by buying it at a low price and selling at a higher one. Forecasting its near-future price is therefore a useful business application of time series analysis. This project analyzes historical gold prices obtained from Yahoo Finance to predict prices in the near future. We implement several models and compare their performance.

Assumptions

The following assumptions were made given the dataset:

  1. The data was collected in a non-biased manner.
  2. It is possible to predict values based on historical data.
  3. The data are stationary, or at least locally stationary. A stationary process has the property that its mean, variance and autocorrelation structure do not change over time. The basic assumption behind averaging and smoothing models is weaker: the time series is locally stationary with a slowly varying mean. Hence, we take a moving (local) average to estimate the current value of the mean and use it as the forecast for the near future, that is, for very short-term forecasting.
  4. For OLS regressions with time series data, the assumptions for unbiasedness of the coefficient estimates (beta) change; we now only require: (i) linearity in parameters, (ii) no perfect collinearity, and (iii) the zero conditional mean assumption.
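The moving-average forecast described in the assumptions above can be sketched in a few lines. This is a minimal illustration, assuming the prices are held in a pandas Series; the function name `moving_average_forecast`, the window length and the toy numbers are hypothetical and not taken from the project's notebook.

```python
import pandas as pd


def moving_average_forecast(prices: pd.Series, window: int = 5) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return float(prices.iloc[-window:].mean())


# Toy numbers for illustration only, not real gold prices.
toy_prices = pd.Series([1800.0, 1810.0, 1805.0, 1815.0, 1820.0, 1825.0])
next_value = moving_average_forecast(toy_prices, window=3)
```

A short window reacts quickly to recent changes, while a longer window smooths more; which is appropriate depends on how slowly the local mean is assumed to vary.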

Ethical ML Framework

  1. Because this project aims only to predict the near-future price of gold, many aspects of the ethical ML framework do not directly apply.
  2. The gold data used here is in the public domain, and we assume it was collected in transparent ways.
  3. The deployed app is designed for clients who invest, as a tool to gauge near-future prices and to inform the decision of whether to invest. Therefore, a large segment of the populace likely cannot take advantage of the analysis presented in this report.
  4. We assume the users are an upper-income population with internet access who are not STEM-educated. For this system to have more social impact, it would need to be applied to more appropriate datasets and extended to other financial vehicles (e.g., stocks, ETFs, mutual funds), the analysis of which could be more advantageous to a broader population.

Dataset

The data is downloaded from Yahoo Finance. It contains historical gold prices from January 1, 2008 to the present.

Visualize the data

Model 1: Random Forest

The actual and predicted prices are very close.
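The report does not list the features used for the Random Forest, so the following is only a sketch under stated assumptions: the inputs are the three previous prices (lag features), the split is chronological, and the data is a synthetic random walk standing in for the gold series. The helper `make_lag_features` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def make_lag_features(prices: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a table holding the target price and its previous n_lags values."""
    frame = pd.DataFrame({"target": prices})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = prices.shift(lag)
    return frame.dropna()  # the first n_lags rows have no complete history


# Synthetic random walk (assumption), not the actual Yahoo Finance data.
rng = np.random.default_rng(0)
prices = pd.Series(1500 + rng.normal(0, 5, 400).cumsum())

data = make_lag_features(prices, n_lags=3)
split = int(len(data) * 0.8)  # chronological split, no shuffling
train, test = data.iloc[:split], data.iloc[split:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(train.drop(columns="target"), train["target"])
predicted = model.predict(test.drop(columns="target"))
```

Because each prediction here uses the previous actual prices, the predicted curve tends to track the actual one closely; that closeness alone does not show that a model can forecast several steps ahead.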

Check if the data is stationary

The p-value is higher than 0.05, which indicates that the data is not stationary.

Methods to make data stationary

We discard the square root method because it still gave a p-value greater than 0.05, and proceed with differencing once.
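The two transformations mentioned here can be sketched with pandas and NumPy. The function names and the toy values are hypothetical; in practice each transformed series would be passed back through the stationarity test.

```python
import numpy as np
import pandas as pd


def sqrt_transform(prices: pd.Series) -> pd.Series:
    """Square root transform, which compresses large values."""
    return np.sqrt(prices)


def difference(prices: pd.Series, order: int = 1) -> pd.Series:
    """Difference the series `order` times, dropping the leading NaN each time."""
    result = prices
    for _ in range(order):
        result = result.diff().dropna()
    return result


# Toy values chosen so the results are easy to verify by hand.
toy = pd.Series([100.0, 121.0, 144.0, 169.0])
rooted = sqrt_transform(toy)       # 10, 11, 12, 13
once = difference(toy)             # 21, 23, 25
twice = difference(toy, order=2)   # 2, 2
```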

Both AIC and BIC selected p = 0 and q = 1.

Model 2: ARIMA with no seasonality

model = SARIMAX(df, order = (p,d,q))

p = number of autoregressive lags
d = order of differencing
q = number of moving average lags

Based on previous findings, the order = (p,d,q) should be:

d (difference) = 2
p = 0
q = 1

When using ARIMA, the forecasted data is the actual forecasted price, not the difference.

Model 2 Evaluation

The analysis above shows that the residuals are correlated and not normally distributed. This means there is information in the data that the model did not capture, so we need to refit the model. We may not be able to use this model to make forecasts.

Seasonal Time Series

The plots above show some obvious seasonality.

Model 3: Auto ARIMA model with seasonality (not considered)

Based on the lowest AIC score, the best-fit ARIMA is order=(0, 1, 0), seasonal_order=(0, 0, 0, 7). Auto ARIMA did not detect any seasonality and suggested differencing only once.

Forecast using Training Data and compare with Test Data

The plot shows that the forecast has an upward trend, which is aligned with the test data. It correctly predicted that gold prices would go up from 2018 to 2021. The test data also falls within the confidence interval; however, the interval is very wide. The model cannot predict exact gold prices well, but it can predict the general trend over time.
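The train/test comparison described here relies on two small steps that can be sketched independently of the model: a chronological split, and a check of how often the test values fall inside the forecast interval. Both helpers below are hypothetical names, and the numbers are toy values.

```python
import numpy as np
import pandas as pd


def chronological_split(series: pd.Series, test_fraction: float = 0.2):
    """Hold out the most recent observations as the test set (no shuffling)."""
    n_test = int(round(len(series) * test_fraction))
    cut = len(series) - n_test
    return series.iloc[:cut], series.iloc[cut:]


def interval_coverage(actual, lower, upper) -> float:
    """Fraction of actual values that lie inside [lower, upper]."""
    actual, lower, upper = (np.asarray(x, dtype=float) for x in (actual, lower, upper))
    return float(np.mean((actual >= lower) & (actual <= upper)))


toy = pd.Series(np.arange(10, dtype=float))
train, test = chronological_split(toy, test_fraction=0.2)   # test holds 8.0 and 9.0

# The first value is inside its interval, the second is above its upper bound.
coverage = interval_coverage(test, lower=[7.0, 8.0], upper=[8.5, 8.5])
```

High coverage is easy to achieve with a very wide interval, so coverage should be read together with the interval width.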

Model 4: Facebook Prophet

In the above plots produced by Prophet, the forecast is a flat line. It is not aligned with the test data, which shows an upward trend. Prophet does not seem to be as accurate as the ARIMA model: its MAE, MSE and RMSE are also higher than those of the ARIMA model.
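The three error metrics used for this comparison can be computed directly with NumPy. This is a minimal sketch; the function name `forecast_errors` and the toy numbers are illustrative and are not the report's results.

```python
import numpy as np


def forecast_errors(actual, predicted) -> dict:
    """Compute MAE, MSE and RMSE between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = float(np.mean(errors ** 2))
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
    }


# Toy values: the errors are -1, 0 and 3.
scores = forecast_errors([100.0, 102.0, 104.0], [101.0, 102.0, 101.0])
```

For a fair comparison, both models should be scored on the same test period and on the same scale (prices, not differences).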

Auto TS

AutoTS also selected "auto_SARIMAX" with first-order differencing as the best model. Hence, we decided to proceed with this method.

Deployment

Gold is a major financial asset for countries and central banks. Banks use it as a hedge against loans made to their governments and as an indicator of economic health. It can be viewed like a currency. In many countries, people are physically and emotionally attached to gold, and it has always been a go-to investment. This app is designed to predict gold prices to help with decisions about buying, selling or holding the commodity. https://predict-gold-price.herokuapp.com/

Note from the Authors

Disclaimer: Please review this disclaimer carefully before using the model hosted and operated by Group 5. The content and material displayed is the intellectual property of Group 5. The information, content and principles may not be reused, republished, or reprinted without the formal consent of all members. The model is intended for informational and educational purposes only and is not a substitute for insight, feedback or advice from professionals or third parties. Use at your own discretion. Although the data presented has undergone procedures to ensure its completeness, the authors cannot guarantee that it is free of errors, mistakes or misinformation.

References:

  1. https://github.com/AutoViML/Auto_TS/tree/master/example_notebooks
  2. https://github.com/IamMayankThakur/gold-price-analysis/blob/master/gold_price_analysis.ipynb
  3. https://blog.quantinsti.com/gold-price-prediction-using-machine-learning-python/
  4. https://finance.yahoo.com/